Privacy- and Utility-Preserving NLP with Anonymized Data: A case study of Pseudonymization
This work investigates the effectiveness of different pseudonymization
techniques, ranging from rule-based substitutions to pre-trained Large
Language Models (LLMs), on a variety of datasets and models for two widely
used NLP tasks: text classification and summarization. Our work provides
insights into the gaps between original and anonymized data (focusing on
pseudonymization) and into the resulting differences in model quality, and it
fosters future research into higher-quality anonymization techniques that
better balance the trade-off between data protection and utility preservation.
We make our code, pseudonymized datasets, and downstream models publicly available.
Comment: 10 pages. Accepted for the TrustNLP workshop at ACL202
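The simplest end of the range studied is rule-based substitution. A minimal sketch, assuming the entity strings to replace are already known (for example from an upstream NER step, not shown); the function name and the replacement pool are hypothetical, and this is not the authors' released code. Each original string is mapped to the same pseudonym everywhere it occurs, so co-references stay consistent.

```python
import re


def pseudonymize(text: str, entities: list[str], pool: list[str]) -> tuple[str, dict[str, str]]:
    """Rule-based pseudonymization: swap each known entity for a pseudonym.

    `entities` are surface strings assumed to come from an NER step (not shown).
    `pool` is a hypothetical list of replacement names. Pseudonyms are assigned
    in order of first appearance, and repeated mentions reuse the same one.
    """
    if not entities:
        return text, {}

    # Longest-first alternation so "Jane Doe" is matched before "Jane".
    alternation = "|".join(re.escape(e) for e in sorted(set(entities), key=len, reverse=True))
    pattern = re.compile(rf"\b(?:{alternation})\b")

    mapping: dict[str, str] = {}
    replacements = iter(pool)

    def substitute(match: re.Match) -> str:
        original = match.group(0)
        if original not in mapping:
            pseudonym = next(replacements, None)
            if pseudonym is None:
                raise ValueError("pseudonym pool exhausted")
            mapping[original] = pseudonym
        return mapping[original]

    return pattern.sub(substitute, text), mapping
```

For example, `pseudonymize("Jane Doe met Bob. Later, Jane Doe called Bob again.", ["Jane Doe", "Bob"], ["Person A", "Person B"])` replaces both mentions of each name consistently. An LLM-based pseudonymizer, the other end of the range, would replace the regex step with a prompted rewrite and is not sketched here.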
Negative Barnett effect, negative moment of inertia of (quark-)gluon plasma and thermal evaporation of chromomagnetic condensate
We discuss the negativity of the moment of inertia of (quark-)gluon plasma in
a "supervortical" window of temperatures above the deconfining phase
transition, found recently in numerical Monte Carlo simulations by two
independent methods. In our work, we confirm
numerically that the origin of this effect is rooted in the thermal evaporation
of the non-perturbative chromomagnetic condensate. We argue that the negative
moment of inertia of gluon plasma indicates the presence of a novel effect, the
negative spin-vortical coupling for gluons, which results in a negative gluonic
Barnett effect: the spin polarization of gluons exceeds the total angular
momentum of the rotating plasma, thus forcing the orbital angular momentum to
take negative values in the supervortical range of temperatures.
Comment: 9 pages, 3 figures
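The last step of the argument is angular-momentum bookkeeping. Writing the total angular momentum of the rotating plasma as an orbital part plus a spin part, with every quantity taken as a signed projection on the rotation axis (the symbols J, L, S are ours and may differ from the authors' notation):

```latex
J = L + S \quad\Longrightarrow\quad L = J - S,
\qquad\text{so}\qquad S > J \;\Longrightarrow\; L < 0 .
```

A gluon spin polarization exceeding the total angular momentum therefore leaves a negative orbital part.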
Improving Automatic Categorization of Technical vs. Laymen Medical Words using FastText Word Embeddings
Detecting words that are difficult to understand is crucial for ensuring the proper understanding of medical texts such as diagnoses and drug instructions. In this paper, we study the use of recently developed word embeddings, which contain context information for words, together with other linguistic and non-linguistic features, to improve the detection of difficult medical words. We propose new cross-validation scenarios to test the generalization ability of medical word difficulty detection from different perspectives, and we provide an experimental study of previously used feature extraction methods together with the recently proposed FastText embeddings. We found that, for known words and unknown users, FastText embeddings improve the detection of word understandability, reaching an F-score of 85.9 (an improvement of up to 2.9 F-score points).
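One property of fastText worth keeping in mind here is that it builds word vectors from character n-grams (lengths 3 to 6 under its default unsupervised training settings, with "<" and ">" boundary markers), so morphologically related medical terms share subword units. A minimal sketch of that decomposition only, with no vectors or training; the function names are hypothetical and this is not the paper's feature pipeline.

```python
def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> list[str]:
    """Character n-grams of `word` in the style of fastText subword units.

    The word is wrapped in boundary markers "<" and ">" so prefixes and
    suffixes are distinguishable from word-internal n-grams; the whole
    bracketed word is also kept as its own unit.
    """
    token = f"<{word}>"
    grams: list[str] = []
    for n in range(n_min, n_max + 1):
        for i in range(len(token) - n + 1):
            grams.append(token[i : i + n])
    if token not in grams:
        grams.append(token)
    return grams


def shared_ngrams(a: str, b: str) -> set[str]:
    """Subword units two words have in common (and would share vectors for)."""
    return set(char_ngrams(a)) & set(char_ngrams(b))
```

For instance, `char_ngrams("where", 3, 3)` yields `<wh`, `whe`, `her`, `ere`, `re>` plus `<where>`, and the French terms "cardiaque" and "cardiologie" share prefix units such as `<card` and `cardi`.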
Generalizability of readability models for medical terms
Detecting words that are difficult to understand is crucial for ensuring the proper understanding of medical texts such as diagnoses and drug instructions. We propose combining supervised machine learning algorithms that use various features with word embeddings that contain context information about words. The data, in French, were manually cross-annotated by seven annotators. On the basis of these data, we propose cross-validation scenarios to test how well models that detect the difficulty of medical words generalize. On the data provided by the seven annotators, we show that the models generalize from one annotator to another.
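Generalization from one annotator to another can be tested with a leave-one-annotator-out split. A minimal sketch, assuming annotations are stored as hypothetical (annotator, word, label) triples; the exact scenarios in the paper may differ. Because all annotators labeled the same word list, the held-out annotator's words typically also appear in training (labeled by others), which corresponds to a "known words, unknown user" setting.

```python
from typing import Iterator

# (annotator, word, label); label 1 = difficult is an assumed encoding
Annotation = tuple[str, str, int]


def leave_one_annotator_out(
    annotations: list[Annotation],
) -> Iterator[tuple[str, list[Annotation], list[Annotation]]]:
    """Yield (held_out_annotator, train, test) splits.

    Every annotation by the held-out annotator goes to the test set; all
    other annotators' labels form the training set.
    """
    for held_out in sorted({annotator for annotator, _, _ in annotations}):
        train = [a for a in annotations if a[0] != held_out]
        test = [a for a in annotations if a[0] == held_out]
        yield held_out, train, test
```

With seven annotators this yields seven folds; a classifier is trained on each `train` list and scored on the matching `test` list.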
RNN embeddings for identifying difficult to understand medical words
Patients and their families often need a better understanding of the medical information provided by doctors. We address this issue by improving the identification of medical words that are difficult to understand. We introduce novel embeddings derived from an RNN, FrnnMUTE (French RNN Medical Understandability Text Embeddings), which reach an F1 score of up to 87.0 in identifying difficult words. We also note that adding pre-trained FastText word embeddings to the feature set substantially improves the performance of the model that classifies words according to their difficulty. We study the generalizability of different models through three cross-validation scenarios that test classifiers in real-world conditions: understanding of medical words by new users, and classification of new, unseen words by the automatic models. The FrnnMUTE embeddings and the categorization code are being made available for research.
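The abstract does not describe FrnnMUTE's architecture or training objective. As a hypothetical illustration of what an RNN-derived word embedding can be, the sketch below runs an untrained character-level Elman RNN over a word and takes the final hidden state as the word vector. The RNN type, hidden size, alphabet, the `UNK` placeholder, and all names are assumptions, and training the weights is omitted.

```python
import math
import random

UNK = "\0"  # hypothetical placeholder for characters outside the alphabet


def init_params(alphabet: str, hidden_size: int, seed: int = 0) -> dict:
    """Randomly initialized (untrained) weights for a character-level Elman RNN."""
    rng = random.Random(seed)
    vocab = list(alphabet) + [UNK]

    def matrix(rows: int, cols: int) -> list[list[float]]:
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "index": {ch: i for i, ch in enumerate(vocab)},
        "W": matrix(hidden_size, len(vocab)),   # input-to-hidden weights
        "U": matrix(hidden_size, hidden_size),  # hidden-to-hidden weights
        "b": [0.0] * hidden_size,               # hidden bias
    }


def rnn_word_embedding(word: str, params: dict) -> list[float]:
    """Run h_t = tanh(W x_t + U h_{t-1} + b) over the characters of `word`
    and return the final hidden state as the word's embedding."""
    W, U, b, index = params["W"], params["U"], params["b"], params["index"]
    h = [0.0] * len(b)
    for ch in word:
        j = index.get(ch, index[UNK])
        # x_t is one-hot, so W x_t reduces to column j of W.
        h = [
            math.tanh(W[k][j] + sum(U[k][m] * h[m] for m in range(len(h))) + b[k])
            for k in range(len(h))
        ]
    return h
```

In a difficulty classifier, such a vector would be concatenated with other features (for example pre-trained FastText vectors) before classification, in the spirit of the feature-set combination described above.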
Negative moment of inertia and rotational instability of gluon plasma
Using first-principles numerical simulations of lattice SU(3) gauge theory, we calculate the isothermal moment of inertia of a rigidly rotating gluon plasma. We find that the moment of inertia unexpectedly takes a negative value below the "supervortical temperature", vanishes at that temperature, and becomes positive at higher temperatures. The negative moment of inertia indicates a thermodynamic instability of rigid rotation. We derive the condition for thermodynamic stability of the vortical plasma and show how it relates to the scale anomaly and the magnetic gluon condensate. The rotational instability of gluon plasma bears a striking similarity to the rotational instabilities of spinning Kerr and Myers-Perry black holes.
Comment: 8 pages, 2 figures
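For orientation, the isothermal moment of inertia is commonly defined from the small-Ω expansion of the free energy in the co-rotating frame. The block below assumes the standard convention F = -T ln Tr exp[-(H - ΩJ)/T], with Ω the angular velocity and J the angular momentum along the rotation axis; it does not reproduce the paper's stability condition involving the scale anomaly and the magnetic condensate.

```latex
F(T,\Omega) = F(T,0) - \tfrac{1}{2}\, I(T)\, \Omega^{2} + O(\Omega^{4}), \qquad
J = -\frac{\partial F}{\partial \Omega} = I(T)\,\Omega + O(\Omega^{3}),
\qquad
I(T) = \left.\frac{\partial J}{\partial \Omega}\right|_{\Omega=0}
     = -\left.\frac{\partial^{2} F}{\partial \Omega^{2}}\right|_{\Omega=0}.
```

In the usual thermodynamic sense, stability of rigid rotation requires ∂J/∂Ω > 0, which at small Ω means I(T) > 0; a negative I(T) therefore signals the instability described above.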